WEBVTT

00:00.800 --> 00:05.340
Hello everybody, and welcome back to our visualization session with Matplotlib.

00:05.450 --> 00:09.040
So in most cases a graph tells us more than a bunch of numbers.

00:09.050 --> 00:16.040
And for the human eye, a graph is typically more intuitive and understandable, and therefore Python also

00:16.040 --> 00:20.870
has an extensive library for visualization called Matplotlib.

00:21.050 --> 00:24.700
So there are actually no limits on customizing graphs with Matplotlib.

00:24.800 --> 00:31.890
And yeah, you could actually create more or less any graph you might think of, and

00:31.930 --> 00:38.760
yeah, it's just a matter of time, because the functionality of Matplotlib is more or less unlimited.

00:40.040 --> 00:46.820
So this session should serve as a short intro that enables you to make the most common plots like line

00:46.820 --> 00:53.570
plots, scatter plots, and histograms, and it should also enable you to make the most common and important

00:53.570 --> 00:56.110
customizations to plots.

00:56.150 --> 01:02.810
So first of all we have to import the plotting framework of Matplotlib, called pyplot, so we import

01:02.810 --> 01:06.080
matplotlib.pyplot as plt.

01:06.710 --> 01:12.650
So it is a standard convention to use the abbreviation plt.

01:12.650 --> 01:16.430
And if you work in a Jupyter notebook, you should also have

01:16.460 --> 01:18.770
the magic function %matplotlib inline.

01:19.520 --> 01:21.260
So let's execute here.

01:22.420 --> 01:27.530
And then, first of all, let's create a list with five elements: 1, 3, 5, 7, and 9,

01:30.170 --> 01:37.100
and we want to create a line plot of these values, and therefore we're using the function

01:37.130 --> 01:45.620
plt.plot for the line plot, and we pass our list to plt.plot, and before we execute

01:45.620 --> 01:53.030
and show our line plot, we should also add plt.show(). So let's execute the cell here.

01:54.320 --> 01:56.800
So now here we can see our line plot.
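
NOTE
Editorial sketch of the cell described above, not the presenter's exact notebook. The list name l is taken from later in the session. The Agg backend line is an assumption added so the sketch also runs outside a notebook; in Jupyter you would use the %matplotlib inline magic instead.
```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
# The list with five elements from the session
l = [1, 3, 5, 7, 9]
# Line plot: with only one argument, the x values default to 0, 1, 2, 3, 4
lines = plt.plot(l)
# Render the figure (and suppress the text description of the plot object in a notebook)
plt.show()
```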

01:56.900 --> 02:00.220
So when looking at the y-axis, we have 1 here.

02:00.230 --> 02:06.550
Then we have 3, we have 5, we have 7,

02:07.240 --> 02:14.500
and we have 9. We could also change the last element here to, let's say, 6.

02:18.290 --> 02:19.090
So here at the end

02:19.090 --> 02:20.610
we then have 6.

02:20.830 --> 02:25.890
And by default plt.plot starts with zero on the x-axis.

02:25.900 --> 02:30.420
So this is actually an interpolated line through the points (0, 1),

02:33.340 --> 02:37.780
(1, 3), (2, 5),

02:41.350 --> 02:52.060
(3, 7), and (4, 6). And we can also define x-axis values. So by default Python assumes integers starting from 0.

02:52.480 --> 02:55.310
And here in this case we have 5 elements.

02:55.320 --> 03:00.560
So from 0 to 5, excluding 5. So this is the default case.

03:00.910 --> 03:09.510
So we get the same graph here, but we can redefine the x-axis. So let's say we want to have x values from

03:09.510 --> 03:10.370
five to 10.

03:10.380 --> 03:13.830
We can do this.

03:14.060 --> 03:17.750
So now the x values run from 5 to 9.
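
NOTE
Editorial sketch of the explicit x values described above, assuming they were passed as range(5, 10), which yields 5 to 9 because the upper bound is excluded. The Agg backend line is an assumption so the sketch runs outside a notebook.
```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
l = [1, 3, 5, 7, 9]
# Default case: equivalent to passing range(0, 5) as x values
default_lines = plt.plot(range(0, 5), l)
plt.show()
# Redefined x-axis: five x values from 5 to 9 (10 is excluded)
shifted_lines = plt.plot(range(5, 10), l)
plt.show()
```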

03:18.650 --> 03:20.870
So now, if we leave out plt.show(),

03:24.660 --> 03:31.320
then we can still see the plot here, but also a description of the plot object, and therefore it's best

03:31.320 --> 03:35.600
practice here to have plt.show() after we plot here.

03:38.530 --> 03:40.550
And we can also make a scatter plot with

03:40.570 --> 03:45.420
the function plt.scatter, where we define the x values,

03:45.440 --> 03:50.800
so from 0 to 4 inclusive, and the y values are our list.

03:50.800 --> 03:53.230
So let's see what we get here.

03:53.650 --> 03:59.540
So we get our scatter points here, and the first one has an x value of 0 and a y value of 1,

03:59.860 --> 04:08.370
and the second one has an x value of 1 and a y value of 3, and the third one has an x value of 2

04:08.370 --> 04:09.590
and a y value of five.

04:09.630 --> 04:10.970
And so on.

04:11.050 --> 04:17.470
However, we have to keep in mind that the number of x values must match the length of our y values

04:17.490 --> 04:21.990
here. So if we, for example, extend x to range(7),

04:21.990 --> 04:27.810
so we have seven elements for our x-axis and only our list l with five elements,

04:27.840 --> 04:34.100
then we get an error message: x and y must be the same size.
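
NOTE
Editorial sketch of the scatter plot and the size mismatch described above. The try/except wrapper and the variable names points and message are illustrative additions, not part of the session; the Agg backend line is an assumption so the sketch runs outside a notebook.
```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
l = [1, 3, 5, 7, 9]
# Scatter plot: x values 0..4 paired with the five y values in l
points = plt.scatter(range(5), l)
plt.show()
# Seven x values against five y values: Matplotlib refuses to plot this
message = None
try:
    plt.scatter(range(7), l)
except ValueError as err:
    message = str(err)
print(message)
```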

04:38.830 --> 04:39.220
All right.

04:39.220 --> 04:44.740
So let's move on to a histogram, and let's assume that we are standing in downtown New York on a street

04:44.770 --> 04:47.590
with a speed limit of 50 miles per hour.

04:47.920 --> 04:51.260
And we are measuring the speed of 10000 cars.

04:51.610 --> 04:58.240
And for our purposes we assume that the speed is normally distributed with mean 55 and standard deviation

04:58.240 --> 04:58.810
5.

04:59.650 --> 05:08.080
So we are creating 10,000 normally distributed random numbers with mean 55 and standard deviation

05:08.080 --> 05:08.610
5.

05:08.620 --> 05:15.250
So we are using a list comprehension here, and we are assigning it to the variable speed New York.

05:15.250 --> 05:18.090
So actually the length of our list should be ten thousand.

05:18.700 --> 05:18.900
Okay.

05:18.910 --> 05:21.770
First of all we have to import the random module.

05:21.790 --> 05:32.500
Then it should work and we can also check some elements here of our list speed New York.

05:32.610 --> 05:35.270
So let's say let's check.

05:35.290 --> 05:38.620
the first 10 elements.

05:40.620 --> 05:41.130
So we have

05:41.130 --> 05:44.470
54, 55.4, and so on,

05:44.520 --> 05:45.470
Forty eight.
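
NOTE
Editorial sketch of generating the simulated measurements. The session only says that the random module and a list comprehension are used; random.gauss, the fixed seed, and the identifier speed_ny (for the list spoken as "speed New York") are assumptions made for illustration.
```python
import random
random.seed(0)  # assumption: fixed seed so repeated runs give the same numbers
# 10,000 normally distributed speeds with mean 55 and standard deviation 5
speed_ny = [random.gauss(55, 5) for _ in range(10000)]
# The list should contain 10,000 measurements
print(len(speed_ny))
# Inspect the first 10 elements
print(speed_ny[:10])
```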

05:47.250 --> 05:51.170
And then we want to create a histogram with our 10,000 measurements.

05:51.200 --> 05:58.530
And therefore we are using plt.hist, and we are passing our list speed New York to plt.hist,

05:58.560 --> 06:02.170
and we are defining that we want to have 100 bins.

06:02.340 --> 06:05.120
So let's see what we get here.

06:06.150 --> 06:12.300
So here we have a histogram of our measurements. So it's no surprise that, as we created our measurements

06:12.330 --> 06:15.460
with a normally distributed random number generator,

06:15.840 --> 06:21.740
we get a kind of bell-shaped curve here, with the mean at approximately 55.

06:22.320 --> 06:29.100
And we can also change the number of bins. Here we have 100 bins, and let's see what happens if we

06:29.100 --> 06:31.530
change it to 10.

06:31.590 --> 06:35.980
So now we have only 10 bins, so that's maybe not the perfect solution here.

06:36.150 --> 06:42.650
So maybe if we increase it to 1000. No, this is maybe also not the best solution.

06:42.650 --> 06:47.930
So one hundred bins for 10000 measurements would be a good solution here.
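
NOTE
Editorial sketch comparing the three bin counts tried above. The data generation (random.gauss, fixed seed), the identifier speed_ny, and the bin_sizes dictionary are illustrative assumptions; the Agg backend line is an assumption so the sketch runs outside a notebook.
```python
import random
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
random.seed(0)  # assumption: fixed seed for reproducibility
speed_ny = [random.gauss(55, 5) for _ in range(10000)]
bin_sizes = {}
for bins in (10, 100, 1000):
    plt.figure()
    # plt.hist returns the counts per bin, the bin edges, and the drawn patches
    counts, edges, patches = plt.hist(speed_ny, bins=bins)
    bin_sizes[bins] = len(counts)
    total = int(sum(counts))
    plt.show()
print(bin_sizes)
```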

06:48.140 --> 06:50.750
And now we can do many customizations.

06:50.750 --> 06:53.070
So first of all, let's plot again.

06:53.070 --> 07:00.470
Here we plot our histogram: we are passing our list speed New York, we are having 100 bins, we are setting

07:00.470 --> 07:06.170
the label to "New York data", we will see later what this will bring us, and we also set alpha to

07:06.170 --> 07:09.290
1 and the color of our histogram to red.

07:12.140 --> 07:17.040
Let's draw the graph here, and it's essentially the same graph as above.

07:17.080 --> 07:19.690
We only changed the color to red.

07:19.730 --> 07:19.880
Yeah.

07:19.940 --> 07:26.000
And then we can make further customizations. So first of all we can change the style. We use the seaborn

07:25.990 --> 07:26.390
style.

07:26.420 --> 07:31.940
So yeah, there are many styles in Matplotlib, and it's more or less a matter of taste which style you

07:31.940 --> 07:32.740
want to use.

07:33.920 --> 07:35.660
So if we change it to seaborn,

07:39.210 --> 07:43.640
then it looks a bit different a bit more scientific actually.

07:44.160 --> 07:52.710
And we can also change the size of the figure by using the function plt.figure and set

07:52.710 --> 07:57.970
figsize to, let's say, (10, 6).

07:58.030 --> 07:59.230
So now it's a bit larger.

07:59.230 --> 08:09.120
We can also change it here to, let's say, (12, 6). And we can also add a title to our graph with the function

08:09.160 --> 08:15.430
plt.title, and then we need to define the title as a string, here "Measured car speed, speed limit

08:15.430 --> 08:16.250
50 miles

08:16.290 --> 08:23.750
per hour". And we can also set the font size, so the font size could be 15 here,

08:27.680 --> 08:35.840
so it's a bit larger here. And then we can also introduce labels for the axes, so we can introduce

08:35.870 --> 08:44.770
a label for the x-axis with plt.xlabel and pass the string "Speed".

08:44.810 --> 08:46.130
So now here we have speed

08:49.140 --> 08:50.290
And the same

08:50.310 --> 08:59.430
we can do with the y label, and for example we can set the label to "Number of occurrences"

09:00.570 --> 09:05.370
here. And we can also plot vertical lines.

09:05.500 --> 09:10.040
That's the function plt.vlines.

09:10.100 --> 09:17.120
So the purpose is to draw a vertical line at the mean value of our measured data.

09:17.450 --> 09:20.430
So yeah, let's press Shift+Tab and see

09:20.660 --> 09:22.580
what we need here.

09:22.640 --> 09:24.250
So we need the x value.

09:24.290 --> 09:27.680
And this is the mean value of our list speed

09:27.680 --> 09:28.160
New York.

09:28.160 --> 09:35.550
So we take the sum divided by the length, and then we also have to define the minimum y value and

09:35.550 --> 09:36.870
the maximum y value.

09:36.870 --> 09:46.760
So here we can set it between 0 and 400, and then we can define that the color should be red and the line style

09:46.780 --> 09:55.150
should be dashed here, and the label should be "Mean New York". So let's see what we get here.

09:58.230 --> 10:05.270
All right, here we should see the dashed line, but as our histogram is not transparent,

10:05.300 --> 10:10.760
we cannot see our dashed line here, and therefore we have to change the alpha for our histogram.

10:10.760 --> 10:15.650
So alpha equals 1 means there's no transparency in our histogram.

10:15.650 --> 10:24.240
So if we change it to, let's say, 0.5, our histogram here becomes transparent, and then we can see

10:24.240 --> 10:26.240
the vertical line here.
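
NOTE
Editorial sketch of the semi-transparent histogram with a dashed vertical line at the mean, using the values named above (ymin 0, ymax 400, red, dashed, alpha 0.5). The data generation, the fixed seed, and the identifiers speed_ny and mean_ny are illustrative assumptions; the Agg backend line is an assumption so the sketch runs outside a notebook.
```python
import random
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
random.seed(0)  # assumption: fixed seed for reproducibility
speed_ny = [random.gauss(55, 5) for _ in range(10000)]
# Mean computed as in the session: sum divided by length
mean_ny = sum(speed_ny) / len(speed_ny)
# alpha=0.5 makes the bars semi-transparent so the line behind them stays visible
counts, edges, patches = plt.hist(speed_ny, bins=100, label="New York data", alpha=0.5, color="red")
# Dashed red vertical line at the mean, drawn from y=0 to y=400
mean_line = plt.vlines(x=mean_ny, ymin=0, ymax=400, colors="red", linestyles="dashed", label="Mean New York")
plt.show()
```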

10:26.430 --> 10:29.980
Next we can set the range of the axes.

10:30.120 --> 10:41.420
So here with plt.axis, and let's see what we need here. So we have to set xmin, xmax, ymin, and

10:41.420 --> 10:41.840
ymax.

10:41.840 --> 10:53.510
So y should be between, let's say, 0 and 400, and x should be between 30 and maybe 90. So let's

10:53.510 --> 10:54.140
run here.

11:02.630 --> 11:06.190
And then we can also set the ticks of our axes.

11:06.230 --> 11:14.510
So here we have ticks at 30, 40, 50, and if we want ticks at, let's say, 30, 35, 40, 45, then we

11:14.510 --> 11:17.280
have to change that here.

11:17.420 --> 11:26.360
So from 30 to 91 we want to have ticks at every fifth value.

11:26.970 --> 11:33.310
So here we are, and the same we can do with the y-axis.

11:34.550 --> 11:37.970
So it makes sense to have ticks at every 50.

11:37.990 --> 11:41.970
Now, so we already have it here,

11:49.500 --> 11:52.790
and then we can define if we want to have the grid here or not.

11:52.790 --> 12:03.570
And by default, with the seaborn style, it's enabled, but we can also disable it with False.

12:10.110 --> 12:11.770
Let's make it True again.

12:15.990 --> 12:20.470
And before, we added to our histogram the label "New York data",

12:20.490 --> 12:25.260
and to our vertical line the label "Mean New York".

12:25.260 --> 12:29.840
And to show the labels we have to call the function plt.legend.

12:30.270 --> 12:35.740
And in this plt.legend we can define the location of our legend.

12:36.030 --> 12:41.820
So let's enable it here, and let's press Shift+Tab.

12:48.580 --> 12:51.180
So here is the location string, so we can pass "best",

12:51.190 --> 12:54.630
or alternatively 0, or "upper right",

12:54.760 --> 13:03.710
this one here, or "center right"; "center right" is 7, and so on. And we also define the font size equals

13:03.760 --> 13:04.840
13.

13:04.910 --> 13:06.470
So let's see what we get here.
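
NOTE
Editorial sketch collecting the customizations from this part of the session: style, figure size, title, axis labels, axis range, ticks, grid, and legend. The title wording is approximated from the spoken audio. The style lookup is an assumption: newer Matplotlib releases renamed the "seaborn" style to "seaborn-v0_8", so the sketch picks whichever is available. Data generation and identifiers (speed_ny, mean_ny) are illustrative.
```python
import random
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
random.seed(0)  # assumption: fixed seed for reproducibility
speed_ny = [random.gauss(55, 5) for _ in range(10000)]
mean_ny = sum(speed_ny) / len(speed_ny)
# Pick the seaborn style under whichever name this Matplotlib version provides
style = next((s for s in ("seaborn", "seaborn-v0_8") if s in plt.style.available), "default")
plt.style.use(style)
plt.figure(figsize=(12, 6))
plt.hist(speed_ny, bins=100, label="New York data", alpha=0.5, color="red")
plt.vlines(x=mean_ny, ymin=0, ymax=400, colors="red", linestyles="dashed", label="Mean New York")
plt.title("Measured car speed, speed limit 50 miles per hour", fontsize=15)
plt.xlabel("Speed")
plt.ylabel("Number of occurrences")
# Axis range given as [xmin, xmax, ymin, ymax]
plt.axis([30, 90, 0, 400])
# Ticks at every 5th value on x (30..90) and every 50th value on y (0..400)
plt.xticks(list(range(30, 91, 5)))
plt.yticks(list(range(0, 401, 50)))
plt.grid(True)
# loc="best" corresponds to location code 0
legend = plt.legend(loc="best", fontsize=13)
ax = plt.gca()
plt.show()
```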

13:08.540 --> 13:14.420
So now we have here our legend and we can also make many more plots within one graph.

13:14.450 --> 13:21.690
So let's assume we are going to Boston and also measure 10,000 times the speed on a street with

13:21.690 --> 13:23.240
a speed limit of 50.

13:23.990 --> 13:31.220
So let's assume the Boston people are driving faster, on average 60 miles per hour in the 50

13:31.220 --> 13:32.510
miles per hour zone.

13:32.510 --> 13:35.280
And the standard deviation is 8.

13:35.450 --> 13:41.480
So let's create more random numbers and store them in the list speed Boston.

13:43.010 --> 13:47.210
And then we can also plot the histogram here from Boston.

13:48.350 --> 13:52.280
So plt.hist, we pass speed Boston,

13:52.310 --> 13:54.840
we set bins to 100,

13:55.200 --> 13:57.220
set the label to "Boston data",

13:58.050 --> 13:59.840
alpha equals 0.5,

13:59.850 --> 14:01.900
and the color to blue.

14:01.920 --> 14:07.970
And let's see what happens.

14:08.000 --> 14:11.450
Here we have both histograms, from New York and from Boston,

14:11.450 --> 14:12.200
in one graph.

14:12.220 --> 14:19.070
And it makes it very easy to compare. And we can also enable here the vertical line at the mean of the

14:19.070 --> 14:20.190
Boston data.

14:22.360 --> 14:22.490
Yeah.

14:22.830 --> 14:28.070
And here we can see that, on average, the Boston people are driving faster than the New York people.

14:28.910 --> 14:33.080
And also the variability of the Boston data is a bit higher.

14:33.080 --> 14:39.140
So it's a bit more diverse here, because the standard deviation of our Boston data was higher, with

14:39.170 --> 14:40.610
eight.
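
NOTE
Editorial sketch of the two overlaid histograms and their mean lines. The parameters (mean 60, standard deviation 8, 100 bins, alpha 0.5, blue) follow the session; the data generation, the fixed seed, the identifiers speed_ny and speed_boston, the blue color of the Boston mean line, and the "Mean Boston" label are illustrative assumptions.
```python
import random
import statistics
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; in Jupyter use %matplotlib inline
import matplotlib.pyplot as plt
random.seed(0)  # assumption: fixed seed for reproducibility
speed_ny = [random.gauss(55, 5) for _ in range(10000)]
speed_boston = [random.gauss(60, 8) for _ in range(10000)]
mean_ny = sum(speed_ny) / len(speed_ny)
mean_boston = sum(speed_boston) / len(speed_boston)
plt.figure(figsize=(12, 6))
# Both histograms share one set of axes; alpha=0.5 keeps the overlap readable
plt.hist(speed_ny, bins=100, label="New York data", alpha=0.5, color="red")
plt.hist(speed_boston, bins=100, label="Boston data", alpha=0.5, color="blue")
plt.vlines(x=mean_ny, ymin=0, ymax=400, colors="red", linestyles="dashed", label="Mean New York")
plt.vlines(x=mean_boston, ymin=0, ymax=400, colors="blue", linestyles="dashed", label="Mean Boston")
plt.legend(loc="best", fontsize=13)
plt.show()
# Numeric check of what the plot shows: Boston is faster on average and more spread out
print(mean_ny, mean_boston)
print(statistics.stdev(speed_ny), statistics.stdev(speed_boston))
```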

14:40.650 --> 14:41.120
All right.

14:41.130 --> 14:44.490
We are finished now with our short introduction to Matplotlib.

14:44.490 --> 14:49.980
And if you want to go deeper into the details, I recommend you go to the documentation site.

14:54.630 --> 14:57.800
So here you can find many examples and tutorials.

14:57.810 --> 15:06.820
And if we have a look at the examples, you can see here the options are actually unlimited.

15:08.770 --> 15:12.310
So there are hundreds or thousands of examples here

15:15.740 --> 15:20.370
So have fun if you like, and I hope you enjoyed the session. Take care, bye.
